Texas City
Embedding And Clustering Your Data Can Improve Contrastive Pretraining
Recent studies of large-scale contrastive pretraining in the text embedding domain show that using single-source minibatches, rather than mixed-source minibatches, can substantially improve overall model accuracy. In this work, we explore extending training data stratification beyond source granularity by leveraging a pretrained text embedding model and the classic k-means clustering algorithm to further split training data apart by the semantic clusters within each source. Experimentally, we observe a notable increase in NDCG@10 when pretraining a BERT-based text embedding model on query-passage pairs from the MSMARCO passage retrieval dataset. Additionally, we conceptually connect our clustering approach to both the Topic Aware Sampling (TAS) aspect of the TAS-B methodology and the nearest-neighbor-based hard-negative mining aspect of the ANCE methodology and discuss how this unified view motivates future lines of research on the organization of contrastive pretraining data.
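The stratification idea in this abstract can be illustrated with a minimal sketch: cluster each source's embeddings with k-means, then build minibatches that never mix sources or clusters. This is not the authors' implementation. The function names (`kmeans`, `stratified_batches`), the `source` field, and the toy two-dimensional vectors standing in for real text embeddings are all assumptions made for illustration, and the k-means here is a plain Lloyd's iteration written with the standard library only.

```python
import random
from collections import defaultdict


def kmeans(vectors, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns one cluster id per vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [vectors[i] for i, a in enumerate(assignments) if a == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments


def stratified_batches(examples, embeddings, k, batch_size, seed=0):
    """Return minibatches whose items share one (source, cluster) stratum."""
    rng = random.Random(seed)
    by_source = defaultdict(list)
    for idx, ex in enumerate(examples):
        by_source[ex["source"]].append(idx)

    batches = []
    for idxs in by_source.values():
        # Cluster within each source separately (hypothetical design choice).
        labels = kmeans([embeddings[i] for i in idxs], min(k, len(idxs)), seed=seed)
        strata = defaultdict(list)
        for i, label in zip(idxs, labels):
            strata[label].append(i)
        for members in strata.values():
            rng.shuffle(members)
            for start in range(0, len(members), batch_size):
                batches.append([examples[i] for i in members[start:start + batch_size]])
    rng.shuffle(batches)  # mix strata across training steps, not within a batch
    return batches


# Toy data: two hypothetical sources, eight examples each.
examples = [{"source": "msmarco" if i < 8 else "other", "id": i} for i in range(16)]
embeddings = [[(i % 2) * 10.0 + (i % 3) * 0.1, float(i // 8)] for i in range(16)]
batches = stratified_batches(examples, embeddings, k=2, batch_size=3)
```

In a real setup the embeddings would come from a pretrained text embedding model, and a library k-means would replace the hand-written loop. The final shuffle acts on whole batches only, so each batch stays inside one stratum and in-batch negatives come from the same semantic cluster.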
- North America > United States > Montana > Flathead County > Kalispell (0.14)
- North America > United States > Florida > Broward County > Fort Lauderdale (0.04)
- North America > Canada (0.04)
- Leisure & Entertainment (1.00)
- Law (1.00)
- Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
NIST benchmarks show facial recognition technology still struggles to identify Black faces
Every few months, the U.S. National Institute of Standards and Technology (NIST) releases the results of benchmark tests it conducts on facial recognition algorithms submitted by companies, universities, and independent labs. A portion of these tests focuses on demographic performance -- that is, how often the algorithms misidentify a Black man as a white man, a Black woman as a Black man, and so on. Stakeholders are quick to say that the algorithms are constantly improving with regard to bias, but a VentureBeat analysis reveals a different story. In fact, our findings cast doubt on the notion that facial recognition algorithms are becoming better at recognizing people of color. That isn't surprising, as numerous studies have shown facial recognition algorithms are susceptible to bias.
- North America > United States > Texas > Galveston County > Texas City (0.05)
- North America > United States > Oregon > Multnomah County > Portland (0.05)
- North America > United States > Oklahoma (0.05)